- .* My thanks to Dick Dievendorff for fixing up this file to format with
- .* GML. Now, if someone could do the same with SCLIB DOC . . .
- :gdoc.
- :frontm.
- :titlep.
- :title.Small-c Version 2.0
- :title.for the IBM Personal Computer
- :title.Release 1.01
- :date.
- :author.Daniel R. Hicks
- :address.
- :aline.
- :aline.RCH38DB(HICKS)
- :eaddress.
- :etitlep.
- :preface.
- :p.Portions of Small-c code Copyright 1982 by J. E. Hendrix.
- :p.Converted to the IBM Personal Computer by D. R. Hicks.
- :p.This document was developed entirely by D. R. Hicks, and is
- :hp1.not:ehp1. copyrighted.
- :body.
- :h1.Why Small-c?
- :p.
- Why bother with a limited version of the C programming language when
- complete versions of the language are available? The main reason is
- price. The cheapest "full" C implementations run about $75, and
- prices increase from there up to the $500 range. Small-c is free, and
- this means that many users who would not want to pay for a full
- implementation will have access to this one. Other users who may not
- be sure that they want to pay for a full C implementation can use this
- one to determine if they like the language.
- :p.
- Another reason for the existence of Small-c is control: You get all
- of the source for Small-c, and you can, if you wish, maintain or
- modify it yourself. You can also change the I/O support package to
- suit your needs, and you can even port the compiler to a different
- processor if you wish (this version is converted from an 8080
- version).
- :h1.What can Small-c do?
- :p.
- Small-c takes a source file, consisting of C-subset statements, and
- creates an assembler source file. This assembler source file can be
- processed by either the IBM Small Assembler or the IBM Macro Assembler
- to produce an IBM/Intel format OBJ file. The OBJ file can, in turn,
- be combined with other OBJ files and LIB files via the DOS LINK
- function to produce an executable EXE module. An I/O & utility
- function library is provided for this purpose, implementing an
- environment which quite closely mimics the UNIX environment.
- :p.
- This version of Small-c can be executed on an IBM PC with 128K of
- storage, one single-sided diskette drive, and running IBM DOS 1.1 or
- 2.0. In all likelihood, the compiler will execute on a 96K system,
- but this has never been tried. Since at least 96K is needed to run
- either of the assemblers, there is little merit in executing in less
- than 96K. Although dual diskette drives are necessary for large
- compilations (where the assembler source can occupy most of one
- diskette), they are useful but not essential otherwise. Where a hard
- file is available, this may be used in place of diskettes. The
- compiler runs equally well on DOS 1.1 and DOS 2.0, although it does
- not utilize the increased function of DOS 2.0. Release 1.01 adds
- support for arrays of up to two dimensions of all supported data types
- (D. Lang, 12/90).
- :p.
- Programs created with the compiler (and suitably assembled and linked)
- can be executed on any IBM DOS system with sufficient storage.
- Normally, the programs must be linked with the associated Small-c
- library which, in addition to providing I/O support, sets up the
- operating environment. However, a reasonably adept programmer should
- be able to link Small-c procedures with procedures from other
- compilers, if necessary coding assembler "glue" via the Small-c
- "#asm" statement.
- :p.
- Although it has never (to my knowledge) been attempted, there is no
- known reason why this version of Small-c (and the programs it
- compiles) should not execute equally well on any suitably configured
- IBM PC-compatible MS DOS system.
- :h1.What WON'T it do?
- :p.
- Small-c was originated by Ron Cain, and was originally published in
- Dr. Dobb's Journal, number 45. The original intent was to provide
- users of small microprocessor systems with a compact yet powerful
- systems programming language, one which could be both used and
- maintained on a small system. For this reason, many of the features
- of full C implementations were omitted, resulting in a restricted but
- usable language. Small-c Version 2.0 was developed by J. E. Hendrix
- and published in Dr. Dobb's Journal, numbers 74 & 75. This version,
- slightly larger to account for the tendency toward larger systems, has
- fewer restrictions and is considerably more powerful. Nonetheless, it
- has significant restrictions relative to full C implementations. The
- following are the principal restrictions:
- :ol.
- :li.Structures and unions are not supported.
- :li.Arrays of more than two dimensions are not supported (two-dimensional arrays were added in Release 1.01).
- :li.Floating point is not supported.
- :li."Long" integers are not supported.
- :li.Only functions returning integers are supported.
- :li.Pointers to pointers, arrays of pointers, and several of the other
- exotic forms of declarations are not supported.
- :eol.
- :h1.Running Small-c
- :p.
- The compiler may be invoked with the parameters specified on the
- command line, with prompting for the parameters, or a combination of
- the two. To invoke the compiler with prompting, type:
- :sl.
- :li.CC
- :esl.
- :pc.The compiler will display the prompt:
- :sl.
- :li.Input file [CON.C]:
- :esl.
- :pc.Type in the name of the file containing your C program. A
- filename extension of "C" is assumed. If you enter no filename, input
- will be accepted from the keyboard.
- :p.The compiler will then display the prompt:
- :sl.
- :li.Output file [CON.ASM]:
- :esl.
- :p.
- Type in the file name of the file to contain the assembler source. A
- filename identical to the source file name and an extension of "ASM"
- is assumed.
- :p.
- The compiler will then display the prompt:
- :sl.
- :li.Listing file [NUL.LST]:
- :esl.
- :p.
- Normally, the listing file is either suppressed (by routing to the NUL
- device) or is routed to the printer (by specifying "prn", for
- example). If no filename is specified, NUL is assumed.
- :pc.
- The compiler then asks the following yes/no questions:
- :sl.
- :li.Interleave C source?
- :li.Monitor function headers?
- :li.Sound alarm on errors?
- :li.Pause on errors?
- :esl.
- :p.
- "Interleave C source" means to place the C source statements into the
- assembler source as comments. This is very useful if the assembler
- source is to be examined or modified after compilation, but it
- increases the size of the assembler source file (by slightly more than
- the size of the C source file).
- :p.
- "Monitor function headers" means to display each function header (and
- include statement) as it is processed. This is useful both for
- monitoring the progress of the compilation and for interpreting the
- context of error messages from the compiler. This question is not
- asked (and function headers and include statements are not displayed)
- if output is being routed to the display screen.
- :p.
- "Sound alarm on errors" means to sound the PC's "bell" (beeper)
- whenever an error message is displayed (also, when the compilation
- ends). This is useful for long compilations where one might wish to
- leave the computer unattended but be alerted when attention is
- needed.
- :p.
- "Pause on errors" means to stop after displaying each error message
- and wait for an "ENTER" before proceeding. This prevents the error
- message from being scrolled off the screen before it can be examined.
- :p.
- After these questions have been answered, the compilation will begin.
- C source statements will be read from the input file, and the
- generated assembler source will be written to the output file.
- Function headers and include statements will be
- displayed if selected. When the compilation is finished, the
- following message is displayed (where n is a number):
- :sl.
- :li.There were n errors in this compilation
- :esl.
- :p.
- If the number of errors is zero (or if you are willing to accept
- whatever the errors are that have occurred), you may process the
- produced assembler source with ASM (IBM Small Assembler) or MASM (IBM
- Macro Assembler). Once the program has been successfully compiled and
- assembled, it should be linked, using the DOS LINK program, with the
- CC.LIB library supplied, thereby producing the executable EXE file.
- :p.
- When parameters are specified via the DOS command line, the file names
- must be entered in order -- source file, assembler file, listing file
- -- followed by or interspersed with the processing options. Roughly,
- the syntax is:
- :sl.
- :li.CC [<source_file> [<assembler_file> [<listing_file>]]] [-[n]<option> . . .] [;]
- :esl.
- :p.
- Each file name, option, or the ";", must be separated by spaces from
- adjacent entries. The options are identified by a leading "-" and
- contain an optional "n" which indicates the "not" of the option. The
- options are:
- :sl.
- :li.i -- interleave C source
- :li.m -- monitor function headers
- :li.a -- sound alarm on errors
- :li.p -- pause on errors
- :esl.
- :p.
- The ";" indicates that defaults are to be taken for the remaining
- options or file names (rather than prompting for them).
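- :p.
- The following sketch shows, in standard C, how a single option token
- of the form described above might be decoded. It is not taken from
- the compiler source: the structure cc_opts and the function
- parse_option are hypothetical names, and the sketch assumes that each
- option is its own token consisting of "-", an optional "n", and
- exactly one of the letters i, m, a, or p.

```c
/* Hypothetical sketch, not the Small-c compiler source. */

/* The four yes/no processing options. */
struct cc_opts {
    int interleave; /* -i : interleave C source      */
    int monitor;    /* -m : monitor function headers */
    int alarm;      /* -a : sound alarm on errors    */
    int pause;      /* -p : pause on errors          */
};

/* Apply one token such as "-i" or "-np" to *o.
   Returns 1 if the token was a recognized option, 0 otherwise
   (for example a file name, ";", or an unknown letter). */
int parse_option(const char *tok, struct cc_opts *o)
{
    int value = 1;

    if (tok[0] != '-')
        return 0;
    tok++;
    if (tok[0] == 'n') {        /* optional "n" means "not" */
        value = 0;
        tok++;
    }
    if (tok[0] == '\0' || tok[1] != '\0')
        return 0;               /* exactly one option letter expected */
    switch (tok[0]) {
    case 'i': o->interleave = value; return 1;
    case 'm': o->monitor = value;    return 1;
    case 'a': o->alarm = value;      return 1;
    case 'p': o->pause = value;      return 1;
    default:  return 0;
    }
}
```

- :p.
- Under these assumptions, the tokens "-i" and "-na" would turn
- interleaving on and the alarm off, while a token such as "prog" is
- left for the caller to treat as a file name.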
- :h1.Data representations
- :p.
- Small-c recognizes seven different data types:
- :sl.
- :li.Integers
- :li.Characters
- :li.Integer arrays
- :li.Character arrays
- :li.Integer pointers
- :li.Character pointers
- :li.Integer functions
- :esl.
- :pc.
- No other combinations are recognized.
- :p.
- Integers are signed binary numbers ranging between -32768 and +32767
- and occupying 16 bits (two bytes) of data storage on a byte boundary.
- The low-order eight bits of the number are stored in the byte with the
- lower address.
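- :p.
- The byte order described above can be made explicit in standard C.
- The helpers store16 and load16 below are hypothetical; they write and
- read the low-order byte at the lower address regardless of the byte
- order of the machine the sketch is compiled on.

```c
#include <stdint.h>

/* Hypothetical helper: store a 16-bit integer with the low-order
   byte at the lower address and the high-order byte just above it. */
void store16(unsigned char *p, int16_t v)
{
    uint16_t u = (uint16_t)v;          /* reinterpret bits as unsigned */
    p[0] = (unsigned char)(u & 0xFF);  /* low-order eight bits first */
    p[1] = (unsigned char)(u >> 8);    /* high-order eight bits next */
}

/* Hypothetical helper: reassemble the integer from its two bytes.
   The final conversion relies on the usual two's complement
   behavior of modern compilers for values above 32767. */
int16_t load16(const unsigned char *p)
{
    uint16_t u = (uint16_t)(p[0] | (p[1] << 8));
    return (int16_t)u;
}
```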
- :p.
- Characters are signed binary numbers ranging between -128 and +127 and
- occupying 8 bits (one byte) of data storage on a byte boundary.
- :p.
- Integer arrays, of one or two dimensions, consist of 16-bit signed
- integers. Each element occupies two bytes of data storage on a byte
- boundary, with the low-order eight bits of each element occupying the
- byte with the lower address. The first element (the one corresponding
- to an index value of zero) occupies the two-byte area with the lowest
- address, the next element occupies the adjacent two-byte area at the
- next higher address, and so on.
- :p.
- Character arrays, of one or two dimensions, consist of 8-bit signed
- integers. Each element occupies one byte of data storage on a byte
- boundary. The first element (the one corresponding to an index value
- of zero) occupies the one-byte area with the lowest address, the next
- element occupies the adjacent higher address, and so on.
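- :p.
- The element placement described above reduces to a simple offset
- calculation, sketched below in standard C with hypothetical function
- names. The one-dimensional case follows directly from the text. The
- two-dimensional case ASSUMES row-major order, as in standard C; this
- document does not state how Release 1.01 lays out two-dimensional
- arrays.

```c
/* Element sizes given in the text. */
#define INT_ELEM_SIZE  2   /* integer array element: two bytes */
#define CHAR_ELEM_SIZE 1   /* character array element: one byte */

/* Hypothetical helper: byte offset of element `index` from the start
   of a one-dimensional array; element 0 is at the lowest address. */
unsigned elem_offset(unsigned index, unsigned elem_size)
{
    return index * elem_size;
}

/* Hypothetical helper: byte offset of element [row][col] in a
   two-dimensional array with `ncols` columns, ASSUMING row-major
   order as in standard C (not confirmed for Release 1.01). */
unsigned elem_offset2(unsigned row, unsigned col, unsigned ncols,
                      unsigned elem_size)
{
    return (row * ncols + col) * elem_size;
}
```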
- :p.
- Integer and character pointers are both 16 bit unsigned numbers,
- ranging between 0 and 65535 and occupying two bytes of data storage on
- a byte boundary. As with integers, the low-order eight bits of the
- number are stored in the byte with the lower address. The only
- distinction between the two is in the semantics of their use:
- Dereferencing (via "*") an integer pointer causes a two-byte quantity
- to be fetched or stored, while dereferencing a character pointer
- causes a one-byte quantity to be fetched or stored. Similarly,
- incrementing (decrementing) an integer pointer causes two to be added
- to (subtracted from) the pointer, while incrementing (decrementing) a
- character pointer causes one to be added to (subtracted from) the
- pointer.
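- :p.
- The scaling of pointer increments can be observed with any C
- compiler. In this standard C sketch, int16_t stands in for the 16-bit
- Small-c integer (a modern compiler's int is usually wider); the
- function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes moved by one increment of a 16-bit integer pointer. */
ptrdiff_t int_ptr_step(void)
{
    int16_t a[2];
    int16_t *p = a;
    const char *before = (const char *)p;
    p++;                               /* advance one element */
    return (const char *)p - before;   /* distance in bytes: 2 */
}

/* Bytes moved by one increment of a character pointer. */
ptrdiff_t char_ptr_step(void)
{
    char a[2];
    char *p = a;
    const char *before = p;
    p++;                               /* advance one element */
    return p - before;                 /* distance in bytes: 1 */
}
```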
- :p.
- Functions are variable length areas of code storage beginning on a
- byte boundary. Functions are presumed to consist of 8088 instructions
- arranged in a sequence which is meaningful and which conforms to
- Small-c linkage and usage protocols.
- :p.
- The C language is unusual among "modern" languages in that the
- value of a named entity (i.e., variable or function) is the contents of
- the entity only when the entity is a scalar. In other cases, the
- value of the entity is its address. Thus, the value of an integer or
- character array is the offset in the data segment to the first element
- (the one with an index of zero) and the value of a function (when its
- name is not followed by a parameter list) is the offset in the code
- segment to the entry point of the function.
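- :p.
- The following standard C sketch illustrates both cases: an array name
- used as a value yields the address of its first element, and a
- function name without a parameter list yields the function's address.
- Standard C requires a pointer-to-function type to call through such an
- address, which Small-c does not have; the names table, twice, and
- call_through are hypothetical.

```c
/* An array whose name, used as a value, is the address of table[0]. */
int table[4] = {10, 20, 30, 40};

/* An ordinary function; the name `twice` alone is its address. */
int twice(int x)
{
    return 2 * x;
}

/* Call whatever function `f` addresses, passing `x`. */
int call_through(int (*f)(int), int x)
{
    return f(x);
}
```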
- :p.
- Since C is a loosely-typed language, much of the semantics of a named
- entity is dependent upon the context of its use. Any of the above
- data types may be dereferenced with a "*", for instance (although
- dereferencing a character variable or a function name is guaranteed to
- be meaningless -- and dangerous). Likewise, any of the above data
- types may be treated as a function and called by appending a parameter
- list. (In this case, character variables and array names are
- meaningless -- and even more dangerous.)
- :p.
- Another unusual aspect of C is that character values are extended to
- integer values when used in an expression, when passed as an argument,
- or when referenced by a switch or return statement. The C language
- standard permits this conversion to be done either via sign extension,
- or by filling the high-order bits of the integer with zeros. Small-c
- uses sign extension. Thus an expression such as:
- :sl.
- :li.c == 255
- :esl.
- :pc.(where c is a character variable) will never be true.
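- :p.
- The sketch below reproduces this behavior in standard C. Because a
- plain char may be either signed or unsigned in standard C, it uses
- signed char to model the Small-c character type; the function names
- are hypothetical. Masking with 255 before comparing, as in
- is_byte_ff, gives the intended test under sign extension.

```c
/* Sign extension: a signed char holding the bit pattern 0xFF
   promotes to the int -1, so this comparison is never true. */
int is_255_signed(signed char c)
{
    return c == 255;
}

/* Zero extension (also permitted by the language, but not what
   Small-c does): the same bit pattern promotes to 255. */
int is_255_unsigned(unsigned char c)
{
    return c == 255;
}

/* Works under sign extension: keep only the low-order eight bits. */
int is_byte_ff(signed char c)
{
    return (c & 255) == 255;
}
```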
- :h1.Language features and restrictions
- :p.
- In addition to the restrictions stated above, the following
- restrictions hold:
- :sl.
- :li.Lower-case and upper-case symbols are synonymous.
- :li.Local declarations at a block level and goto statements may not be
- used in the same function.
- :li.The sizeof operator is not supported.
- :li.The cast operator is restricted to four cases:
- :sl.
- :li.(int)
- :li.(char)
- :li.(int *)
- :li.(char *)
- :esl.
- :pc.No extraneous blanks are allowed within the cast operator.
- :li.Initializers are only permitted on global or static declarations,
- and only literal values may be used for initializers.
- :esl.
- :h1.Storage and linkage conventions
- :p.
- This version of Small-c utilizes the "small" storage model: The CS
- register addresses a single code segment (of up to 64K), while the DS,
- ES, and SS registers address a single data segment (also of up to
- 64K). The code segment comes first (lowest) in storage, followed
- immediately by the data segment. Static storage and string constants
- are allocated at the low end of the data segment, with a heap growing
- up from the top of the statics, and a stack growing down from the top
- of the data segment. The initialization code contained in C.LIB
- initializes the data segment to be as large as available storage, up
- to the 64K maximum. Provision is also made for allocating I/O buffers
- above the data segment when extra storage is available, although this
- facility is so far unused. In theory, the above allows a single program to
- utilize up to 128K of storage, although, in practice, one will usually
- use up the code segment roughly twice as fast as the data segment, so
- that 96K (not counting DOS) is a more practical limit.
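- :p.
- The layout described above can be modeled in a few lines of standard
- C. This is a toy model for illustration only, not the initialization
- code in C.LIB: the names dataseg, seg_init, heap_alloc, and stack_push
- are hypothetical, and the collision checks are the model's own choice.

```c
#define SEG_MAX 65536UL           /* 64K data segment limit */

/* Offsets within the (modeled) data segment. */
struct dataseg {
    unsigned long statics_end;   /* first byte above static storage */
    unsigned long heap_top;      /* next free heap byte (grows up) */
    unsigned long stack_ptr;     /* current SP (grows down) */
};

/* Statics at the bottom; heap starts just above them; stack starts
   at the top of a segment that is never larger than 64K. */
void seg_init(struct dataseg *d, unsigned long statics_size,
              unsigned long seg_size)
{
    if (seg_size > SEG_MAX)
        seg_size = SEG_MAX;
    d->statics_end = statics_size;
    d->heap_top = statics_size;
    d->stack_ptr = seg_size;
}

/* Grow the heap by n bytes. Returns the offset of the new block,
   or -1 if the heap would run into the stack. */
long heap_alloc(struct dataseg *d, unsigned long n)
{
    unsigned long start = d->heap_top;
    if (n > d->stack_ptr - d->heap_top)
        return -1;
    d->heap_top += n;
    return (long)start;
}

/* Push n bytes onto the stack. Returns 0, or -1 on collision. */
int stack_push(struct dataseg *d, unsigned long n)
{
    if (n > d->stack_ptr - d->heap_top)
        return -1;
    d->stack_ptr -= n;
    return 0;
}
```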
- :p.
- Subroutine linkage and automatic storage allocation use the
- stack. To call a subroutine, parameters are first pushed onto the
- stack. In standard C fashion, scalar parameters (char or int) are
- passed by value, while arrays and strings are passed by address. Char
- and int values both occupy two bytes on the stack; char values are
- sign-extended to 16 bits. Address values are also passed as
- two-byte quantities: Whether the address is interpreted as an offset
- into the code segment or an offset into the data segment depends on
- the way it is used. (In fact, Small-c does not recognize pointers to
- functions as a data type, but reference to a function name without
- following "()" yields its offset into the code segment, and reference
- to a variable followed by "()" results in a call to the location in
- the code segment indicated by the variable.)
- :p.
- Parameters are pushed in order of occurrence: The first parameter in
- a list is the first one pushed and therefore the deepest one in the
- stack. This is opposite the order of many C compilers, and it
- prevents some C library functions (such as "printf") from being able
- to determine how many parameters are present by examining the first or
- second one. For this reason, the compiler, prior to a CALL, loads
- register DL with the parameter count, thus allowing functions such as
- "printf" to be implemented. This feature (the loading of DL) can be
- disabled in cases where it is not needed, thereby generating more
- compact code.
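- :p.
- The effect of pushing parameters in order of occurrence, and the reason
- the count is needed, can be modeled in standard C. In this sketch the
- stack is an array of int that grows toward lower indexes, like the
- 8088 stack, and the count is passed as an ordinary argument rather
- than in DL. All names (vstack, vs_push, arg_at) are hypothetical.

```c
#define STACK_WORDS 32

/* Toy stack that grows toward lower indexes. */
struct vstack {
    int word[STACK_WORDS];
    int sp;                 /* index of the most recently pushed word */
};

void vs_init(struct vstack *s)
{
    s->sp = STACK_WORDS;    /* empty: sp is one past the top word */
}

/* Push one word (no overflow check in this sketch). */
void vs_push(struct vstack *s, int v)
{
    s->word[--s->sp] = v;
}

/* Argument i (0 = first as written in the call) when `count`
   arguments were pushed in source order. The first argument is the
   deepest, so locating it requires knowing `count`. */
int arg_at(const struct vstack *s, int count, int i)
{
    return s->word[s->sp + (count - 1 - i)];
}
```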
- :p.
- After the parameters have been pushed onto the stack, the function is
- called (as a NEAR procedure). This causes the return address to be
- pushed onto the stack. The called function then saves the current BP
- register value by pushing it onto the stack, loads BP from the current
- SP register value, and decrements SP by the size of automatic storage
- needed (the 8088 stack grows toward lower addresses). Thus, automatic
- storage can be addressed by using negative offsets from BP, while
- parameters can be addressed using positive offsets.
- :p.
- To return from a function, the result, if any, is first loaded into the
- AX register. Then, SP is loaded from BP (to pop any automatic storage
- present), and the old value of BP is restored by popping it from the
- stack. Finally, a NEAR return is executed which pops the saved return
- address from the stack and returns to the calling function. It is the
- responsibility of the calling routine to pop the parameters from the
- stack (this is an area of possible incompatibility with other
- languages).
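- :p.
- Putting the calling and return sequences together, the BP-relative
- position of each parameter follows from the sizes given above: two
- bytes for the saved BP, two bytes for the NEAR return address, and two
- bytes per parameter. The function arg_offset below is a hypothetical
- helper that computes this offset; it is a worked illustration of the
- arithmetic, not code from the compiler.

```c
/* Frame after the called function has pushed BP and loaded BP from SP:
     [BP+0]               saved BP          (2 bytes)
     [BP+2]               return address    (2 bytes, NEAR call)
     [BP+4]               last parameter pushed
     ...
     [BP+4+2*(nargs-1)]   first parameter   (deepest)
   Automatic storage lies below BP, at negative offsets. */
int arg_offset(int nargs, int i)   /* i = 0 for the first parameter */
{
    return 4 + 2 * (nargs - 1 - i);
}
```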
- :p.
- Although the effect can be negated by the inappropriate use of global
- or static variables, all code generated by this compiler is reentrant.
- :h1.Execution environment
- :p.
- It is the intent of this implementation to mimic the standard UNIX C
- operating environment as accurately as possible and reasonable, given
- the limitations of Small-c and the difference in operating systems.
- Thus, :hp1.main:ehp1. is passed a pair of parameters, as in UNIX C,
- with the first parameter indicating the number of arguments, and the
- second parameter pointing to an array of integers which can be cast
- into character pointers to the arguments. These arguments are the
- blank-separated arguments parsed from the DOS command line, except
- that the first argument (which in UNIX C is the name of the program)
- is always "main".
- :p.
- Also optionally parsed from the command line are redirections of
- :hp1.stdin:ehp1. and :hp1.stdout:ehp1.. If an argument on the command
- line is preceded by the character "<", it is treated as the file name
- for stdin rather than being added to the argument list. And if an
- argument on the command line is preceded by the character ">", it is
- treated as the file name for stdout rather than being added to the
- argument list. (Caution: Under DOS 2.0, these arguments are
- intercepted and the redirection is performed by DOS rather than by
- C.LIB. Unfortunately, there are bugs in the DOS 2.0 redirection
- support.) If not redirected, stdin, stdout, and stderr all refer to
- the keyboard/display. stderr is not redirectable.
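- :p.
- A minimal sketch of this command-line processing in standard C is
- shown below. It is not the C.LIB code: the names cmdline and
- parse_cmdline are hypothetical, the limit of 16 arguments is
- arbitrary, and the sketch assumes the "<" or ">" is written directly
- in front of the file name (as in "<data.txt") and that arguments are
- separated by blanks only.

```c
#include <string.h>

#define MAX_ARGS 16

/* Hypothetical result of splitting a DOS command line. */
struct cmdline {
    int argc;
    char *argv[MAX_ARGS];
    char *in_name;    /* file named with "<", or NULL for keyboard */
    char *out_name;   /* file named with ">", or NULL for display */
};

/* Split `tail` (modified in place) on blanks. argv[0] is always
   "main"; arguments beyond MAX_ARGS are dropped. */
void parse_cmdline(char *tail, struct cmdline *c)
{
    static char prog[] = "main";
    char *tok;

    c->argc = 0;
    c->argv[c->argc++] = prog;
    c->in_name = NULL;
    c->out_name = NULL;

    for (tok = strtok(tail, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (tok[0] == '<')
            c->in_name = tok + 1;        /* redirect stdin */
        else if (tok[0] == '>')
            c->out_name = tok + 1;       /* redirect stdout */
        else if (c->argc < MAX_ARGS)
            c->argv[c->argc++] = tok;    /* ordinary argument */
    }
}
```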
- :h1.I/O system
- :p.
- It is the intent of this implementation to mimic the standard UNIX C
- I/O interfaces as accurately as possible and reasonable, given the
- limitations of Small-c and the difference in environment. Thus, both
- I/O layers are implemented: The "standard I/O" layer and the UNIX
- system call layer. The principal difference between the two layers is
- not so much one of function as one of performance: The standard I/O
- interface provides an extra level of buffering which is useful to many
- applications which process input or generate output a character at a
- time, while the UNIX system call interface is more efficient when this
- extra buffering is not needed. In addition, the UNIX system call
- interface permits slightly more precise control over the I/O, while
- the standard I/O interface functions are slightly easier to use when
- processing character data.
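- :p.
- The benefit of the extra level of buffering can be shown with a small
- standard C model. Here a hypothetical raw_read function stands in for a
- system-call-level read and counts how often it is called, while
- buffered_getc, in the spirit of the standard I/O layer, calls it only
- when its buffer is empty. None of these names are part of the Small-c
- library.

```c
#include <string.h>

#define BUF_SIZE 32

static const char *src;       /* simulated file contents */
static int src_len, src_pos;
static int raw_calls;         /* number of simulated system calls */

/* Hypothetical stand-in for a system-call-level read. */
static int raw_read(char *dst, int n)
{
    int avail = src_len - src_pos;
    if (n > avail)
        n = avail;
    memcpy(dst, src + src_pos, (size_t)n);
    src_pos += n;
    raw_calls++;
    return n;                 /* 0 means end of file */
}

static char buf[BUF_SIZE];
static int buf_len, buf_pos;

/* Return the next character, or -1 at end of file. */
int buffered_getc(void)
{
    if (buf_pos == buf_len) {          /* buffer exhausted: refill */
        buf_len = raw_read(buf, BUF_SIZE);
        buf_pos = 0;
        if (buf_len == 0)
            return -1;
    }
    return (unsigned char)buf[buf_pos++];
}

/* Point the model at new data and reset all counters. */
void set_source(const char *s, int len)
{
    src = s;
    src_len = len;
    src_pos = 0;
    raw_calls = 0;
    buf_len = buf_pos = 0;
}
```

- :p.
- Reading 100 characters through buffered_getc with a 32-byte buffer
- takes five calls to raw_read (four that return data and one that
- reports end of file), where reading one character per call would take
- about a hundred.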
- :p.
- As with UNIX C, access to the "standard I/O" layer generally requires
- the user to include stdio.h, so that various values are defined and
- various externals are declared. Also as with UNIX C, the include file
- errno.h contains error codes that are stored in the external variable
- errno by the I/O routines.
- :p.
- The UNIX system call layer reserves the filenames "kbd", "scrn",
- "con", and "prn" (all lower case) and treats files opened to these
- names specially. The first three names are intercepted and routed to
- the appropriate keyboard/display interfaces, while the fourth is
- routed to the printer. All other file names are passed to DOS to be
- treated as DOS disk/diskette file names or DOS reserved file names.
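- :p.
- The name check described above might look like the following standard
- C sketch. The enum dev_kind and the function classify_name are
- hypothetical; the sketch simply applies the exact, lower-case matches
- the text lists and passes every other name on to DOS.

```c
#include <string.h>

enum dev_kind { DEV_CONSOLE, DEV_PRINTER, DEV_DOS };

/* Hypothetical classifier: exact lower-case "kbd", "scrn" and "con"
   go to the keyboard/display, "prn" goes to the printer, and any
   other name (including upper-case "PRN") is handed to DOS. */
enum dev_kind classify_name(const char *name)
{
    if (strcmp(name, "kbd") == 0 || strcmp(name, "scrn") == 0 ||
        strcmp(name, "con") == 0)
        return DEV_CONSOLE;
    if (strcmp(name, "prn") == 0)
        return DEV_PRINTER;
    return DEV_DOS;
}
```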
- :p.
- One unusual aspect of this I/O system is that it provides a "bridge"
- between the UNIX C file scheme which uses newline as a record
- terminator, and the IBM/MS-DOS file scheme which uses Carriage
- Return/Line Feed as a record terminator. This conversion is
- implicitly performed for all I/O. If non-character data is to be
- processed, this conversion must be disabled via :hp1.fbinary:ehp1.
- or :hp1._ioctl:ehp1..
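- :p.
- The newline conversion can be sketched in standard C as a pair of
- copy loops. These helpers (text_to_dos and dos_to_text) are
- hypothetical and not part of the Small-c library; they ignore the case
- of a Carriage Return/Line Feed pair split across two buffers, and
- keeping a lone Carriage Return on input is this sketch's own choice.

```c
/* Output side: copy n bytes from src to dst, expanding each '\n'
   to "\r\n". dst must have room for up to 2*n bytes.
   Returns the number of bytes written. */
int text_to_dos(const char *src, int n, char *dst)
{
    int i, j = 0;
    for (i = 0; i < n; i++) {
        if (src[i] == '\n')
            dst[j++] = '\r';         /* insert CR before each LF */
        dst[j++] = src[i];
    }
    return j;
}

/* Input side: copy n bytes from src to dst, turning each "\r\n"
   pair into '\n'. Returns the number of bytes written (at most n). */
int dos_to_text(const char *src, int n, char *dst)
{
    int i, j = 0;
    for (i = 0; i < n; i++) {
        if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
            continue;                /* drop CR of a CR/LF pair */
        dst[j++] = src[i];
    }
    return j;
}
```

- :p.
- In binary mode, as selected by fbinary or _ioctl in the real library,
- neither conversion would be applied.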
- :p.
- The implemented portions of the two levels of I/O interfaces are
- described in a separate document. In general, the standard I/O
- functions have the same names as their UNIX counterparts, while the
- UNIX system call functions have an underscore ("_") prefixed to their
- names. For more detail, users are advised to refer to
- :cit.The Unix System:ecit. by S. R. Bourne, or
- :cit.UNIX Programmer's Manual:ecit. by Bell Laboratories.
- :h1.Debugging
- :p.
- This Small-c implementation contains very little in the way of
- debugging aids. In many cases, this is of no significance, since the
- C language has a fascinating tendency to produce programs which run
- without error on the first attempt. However, since even the best
- programmer will eventually produce errors if his programs are large
- enough, some debugging facilities and techniques are usually
- necessary. The standard IBM/MS-DOS facilities, combined with common
- debugging techniques (such as debug "print" statements) and the few
- facilities provided by the language, have proved adequate for
- debugging thus far.
- :p.
- Someone who plans to use the compiler extensively should take some
- time to familiarize himself with code generated by the compiler. This
- will make it easier to associate code sequences from DEBUG
- "unassemble" with the correct C source statements, thus reducing or
- eliminating the need to obtain assembler listings.
- :p.
- When using LINK to produce a C program's EXE module, it is a good idea
- to obtain a link MAP if debugging of the EXE module is likely to be
- necessary. To do this, respond to the LINK "List File" prompt with
- "PRN" or a file name AND specify the "/m" option so that the map is
- really produced, rather than just obtaining a list of segment sizes.
- This map can then be used when dumping global variables or setting
- DEBUG breakpoints.
- :egdoc.
-